{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "It89APiAtTUF"
   },
   "source": [
    "# Create meeting minutes from an Audio file\n",
    "\n",
    "I downloaded some Denver City Council meeting minutes and selected a portion of the meeting for us to transcribe. You can download it here:  \n",
    "https://drive.google.com/file/d/1N_kpSojRR5RYzupz6nqM8hMSoEF_R7pU/view?usp=sharing\n",
    "\n",
    "If you'd rather work with the original data, the HuggingFace dataset is [here](https://huggingface.co/datasets/huuuyeah/meetingbank) and the audio can be downloaded [here](https://huggingface.co/datasets/huuuyeah/MeetingBank_Audio/tree/main).\n",
    "\n",
    "The goal of this product is to use the Audio to generate meeting minutes, including actions.\n",
    "\n",
    "For this project, you can either use the Denver meeting minutes, or you can record something of your own!\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "sJPSCwPX3MOV"
   },
   "source": [
    "## Again - please note: 2 important pro-tips for using Colab:\n",
    "\n",
    "**Pro-tip 1:**\n",
    "\n",
    "The top of every colab has some pip installs. You may receive errors from pip when you run this, such as:\n",
    "\n",
    "> gcsfs 2025.3.2 requires fsspec==2025.3.2, but you have fsspec 2025.3.0 which is incompatible.\n",
    "\n",
    "These pip compatibility errors can be safely ignored; and while it's tempting to try to fix them by changing version numbers, that will actually introduce real problems!\n",
    "\n",
    "**Pro-tip 2:**\n",
    "\n",
    "In the middle of running a Colab, you might get an error like this:\n",
    "\n",
    "> Runtime error: CUDA is required but not available for bitsandbytes. Please consider installing [...]\n",
    "\n",
    "This is a super-misleading error message! Please don't try changing versions of packages...\n",
    "\n",
    "This actually happens because Google has switched out your Colab runtime, perhaps because Google Colab was too busy. The solution is:\n",
    "\n",
    "1. Kernel menu >> Disconnect and delete runtime\n",
    "2. Reload the colab from fresh and Edit menu >> Clear All Outputs\n",
    "3. Connect to a new T4 using the button at the top right\n",
    "4. Select \"View resources\" from the menu on the top right to confirm you have a GPU\n",
    "5. Rerun the cells in the colab, from the top down, starting with the pip installs\n",
    "\n",
    "And all should work great - otherwise, ask me!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "f2vvgnFpHpID"
   },
   "outputs": [],
   "source": [
    "!pip install -q --upgrade torch==2.5.1+cu124 torchvision==0.20.1+cu124 torchaudio==2.5.1+cu124 --index-url https://download.pytorch.org/whl/cu124\n",
    "!pip install -q requests bitsandbytes==0.46.0 transformers==4.48.3 accelerate==1.3.0 openai"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "FW8nl3XRFrz0"
   },
   "outputs": [],
   "source": [
    "# imports\n",
    "\n",
    "import os\n",
    "import requests\n",
    "from IPython.display import Markdown, display, update_display\n",
    "from openai import OpenAI\n",
    "from google.colab import drive\n",
    "from huggingface_hub import login\n",
    "from google.colab import userdata\n",
    "from transformers import AutoTokenizer, AutoModelForCausalLM, TextStreamer, BitsAndBytesConfig\n",
    "import torch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "q3D1_T0uG_Qh"
   },
   "outputs": [],
   "source": [
    "# Constants\n",
    "\n",
    "AUDIO_MODEL = \"whisper-1\"\n",
    "LLAMA = \"meta-llama/Meta-Llama-3.1-8B-Instruct\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "Es9GkQ0FGCMt"
   },
   "outputs": [],
   "source": [
    "# New capability - connect this Colab to my Google Drive\n",
    "# See immediately below this for instructions to obtain denver_extract.mp3\n",
    "\n",
    "drive.mount(\"/content/drive\")\n",
    "audio_filename = \"/content/drive/MyDrive/llms/denver_extract.mp3\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HTl3mcjyzIEE"
   },
   "source": [
    "# Download denver_extract.mp3\n",
    "\n",
    "You can either use the same file as me, the extract from Denver city council minutes, or you can try your own..\n",
    "\n",
    "If you want to use the same as me, then please download my extract here, and put this on your Google Drive:  \n",
    "https://drive.google.com/file/d/1N_kpSojRR5RYzupz6nqM8hMSoEF_R7pU/view?usp=sharing\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "xYW8kQYtF-3L"
   },
   "outputs": [],
   "source": [
    "# Sign in to HuggingFace Hub\n",
    "\n",
    "hf_token = userdata.get('HF_TOKEN')\n",
    "login(hf_token, add_to_git_credential=True)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "qP6OB2OeGC2C"
   },
   "outputs": [],
   "source": [
    "# Sign in to OpenAI using Secrets in Colab\n",
    "\n",
    "openai_api_key = userdata.get('OPENAI_API_KEY')\n",
    "openai = OpenAI(api_key=openai_api_key)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "GMShdVGlGGr4"
   },
   "outputs": [],
   "source": [
    "# Use the Whisper OpenAI model to convert the Audio to Text\n",
    "# If you'd prefer to use an Open Source model, class student Youssef has contributed an open source version\n",
    "# which I've added to the bottom of this colab\n",
    "\n",
    "audio_file = open(audio_filename, \"rb\")\n",
    "transcription = openai.audio.transcriptions.create(model=AUDIO_MODEL, file=audio_file, response_format=\"text\")\n",
    "print(transcription)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "piEMmcSfMH-O"
   },
   "outputs": [],
   "source": [
    "system_message = \"You are an assistant that produces minutes of meetings from transcripts, with summary, key discussion points, takeaways and action items with owners, in markdown.\"\n",
    "user_prompt = f\"Below is an extract transcript of a Denver council meeting. Please write minutes in markdown, including a summary with attendees, location and date; discussion points; takeaways; and action items with owners.\\n{transcription}\"\n",
    "\n",
    "messages = [\n",
    "    {\"role\": \"system\", \"content\": system_message},\n",
    "    {\"role\": \"user\", \"content\": user_prompt}\n",
    "  ]\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "UcRKUgcxMew6"
   },
   "outputs": [],
   "source": [
    "quant_config = BitsAndBytesConfig(\n",
    "    load_in_4bit=True,\n",
    "    bnb_4bit_use_double_quant=True,\n",
    "    bnb_4bit_compute_dtype=torch.bfloat16,\n",
    "    bnb_4bit_quant_type=\"nf4\"\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "6CujZRAgMimy"
   },
   "outputs": [],
   "source": [
    "tokenizer = AutoTokenizer.from_pretrained(LLAMA)\n",
    "tokenizer.pad_token = tokenizer.eos_token\n",
    "# inputs = tokenizer.apply_chat_template(messages, return_tensors=\"pt\").to(\"cuda\")\n",
    "streamer = TextStreamer(tokenizer)\n",
    "model = AutoModelForCausalLM.from_pretrained(LLAMA, device_map=\"auto\", quantization_config=quant_config, trust_remote_code=True)\n",
    "# outputs = model.generate(inputs, max_new_tokens=2000, streamer=streamer)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "MaLNmJ5PSqcH"
   },
   "outputs": [],
   "source": [
    "inputs = tokenizer.apply_chat_template(messages, return_tensors=\"pt\").to(\"cuda\")\n",
    "outputs = model.generate(inputs, max_new_tokens=2000, streamer=streamer)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "102tdU_3Peam"
   },
   "outputs": [],
   "source": [
    "response = tokenizer.decode(outputs[0])\n",
    "response"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "KlomN6CwMdoN"
   },
   "outputs": [],
   "source": [
    "display(Markdown(response))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0jZElVOMSPAr"
   },
   "source": [
    "Day5 exercise - Gradio UI for meeting minutes"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "5iiYYxQMHf0i"
   },
   "outputs": [],
   "source": [
    "import gradio as gr\n",
    "import tempfile\n",
    "import soundfile as sf"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "aGwXW7BjPcTM"
   },
   "outputs": [],
   "source": [
    "# !pip install pydub\n",
    "# !apt-get install ffmpeg\n",
    "\n",
    "from pydub import AudioSegment"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "RNu-reHuCYj_"
   },
   "outputs": [],
   "source": [
    "# Make sure that the tokenizeer and model is already generated\n",
    "\n",
    "# tokenizer = AutoTokenizer.from_pretrained(LLAMA)\n",
    "# tokenizer.pad_token = tokenizer.eos_token\n",
    "# streamer = TextStreamer(tokenizer)\n",
    "# model = AutoModelForCausalLM.from_pretrained(LLAMA, device_map=\"auto\", quantization_config=quant_config)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "KOuoH0YOPruE"
   },
   "outputs": [],
   "source": [
    "# def save_as_mp3(audio_np):\n",
    "#     sr, data = audio_np\n",
    "#     # Convert float32 or int16 to PCM wav and then mp3\n",
    "#     wav_path = tempfile.NamedTemporaryFile(suffix=\".wav\", delete=False).name\n",
    "#     mp3_path = tempfile.NamedTemporaryFile(suffix=\".mp3\", delete=False).name\n",
    "\n",
    "#     sf.write(wav_path, data, sr)\n",
    "#     audio_segment = AudioSegment.from_wav(wav_path)\n",
    "#     audio_segment.export(mp3_path, format=\"mp3\", bitrate=\"64k\")  # Low bitrate = small file\n",
    "#     return mp3_path"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "toBIPBJoSNw0"
   },
   "outputs": [],
   "source": [
    "# Handles audio input as numpy array and returns updated chat history\n",
    "def speak_send(audio_np):\n",
    "\n",
    "    # If use numpy as input: audio_input = gr.Audio(sources=\"upload\", type=\"numpy\", label=\"Upload audio file to generate meeting minutes\")\n",
    "    # mp3_path = save_as_mp3(audio_np)\n",
    "\n",
    "    # with open(mp3_path, \"rb\") as audio_file:\n",
    "    #     transcription = openai.audio.transcriptions.create(\n",
    "    #         model=AUDIO_MODEL,\n",
    "    #         file=audio_file,\n",
    "    #         response_format=\"text\"\n",
    "    #     )\n",
    "\n",
    "    audio = AudioSegment.from_file(audio_np)\n",
    "    with tempfile.NamedTemporaryFile(suffix=\".mp3\", delete=False) as tmpfile:\n",
    "        audio.export(tmpfile.name, format=\"mp3\")\n",
    "        with open(tmpfile.name, \"rb\") as file:\n",
    "            transcript = openai.audio.transcriptions.create(\n",
    "                model=AUDIO_MODEL,\n",
    "                file=file,\n",
    "                response_format=\"text\"\n",
    "            )\n",
    "\n",
    "    system_message = \"You are an assistant that produces minutes of meetings from transcripts, with summary, key discussion points, takeaways and action items with owners, in markdown.\"\n",
    "    user_prompt = f\"Below is an extract transcript of a Denver council meeting. Please write minutes in markdown, including a summary with attendees, location and date; discussion points; takeaways; and action items with owners.\\n{transcription}\"\n",
    "\n",
    "    messages = [\n",
    "        {\"role\": \"system\", \"content\": system_message},\n",
    "        {\"role\": \"user\", \"content\": user_prompt}\n",
    "      ]\n",
    "\n",
    "    inputs = tokenizer.apply_chat_template(messages, return_tensors=\"pt\").to(\"cuda\")\n",
    "    outputs = model.generate(inputs, max_new_tokens=2000)\n",
    "\n",
    "    _, _, after = tokenizer.decode(outputs[0]).partition(\"assistant<|end_header_id|>\")\n",
    "    return after.strip()\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "xXJfabpDSN5R"
   },
   "outputs": [],
   "source": [
    "with gr.Blocks() as demo:\n",
    "\n",
    "    with gr.Row():\n",
    "        audio_input = gr.Audio(sources=\"upload\", type=\"filepath\", label=\"Upload audio file to generate meeting minutes\")\n",
    "    with gr.Row():\n",
    "        audio_submit = gr.Button(\"Send\")\n",
    "    with gr.Row():\n",
    "      outputs = [gr.Markdown(label=\"Meeting minutes:\")]\n",
    "\n",
    "    audio_submit.click(speak_send, inputs=audio_input, outputs=outputs)\n",
    "\n",
    "demo.launch(debug=True)\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "kuxYecT2QDQ9"
   },
   "source": [
    "# Student contribution\n",
    "\n",
    "Student Emad S. has made this powerful variation that uses `TextIteratorStreamer` to stream back results into a Gradio UI, and takes advantage of background threads for performance! I'm sharing it here if you'd like to take a look at some very interesting work. Thank you, Emad!\n",
    "\n",
    "https://colab.research.google.com/drive/1Ja5zyniyJo5y8s1LKeCTSkB2xyDPOt6D"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "AU3uAEyU3a-o"
   },
   "source": [
    "## Alternative implementation\n",
    "\n",
    "Class student Youssef has contributed this variation in which we use an open-source model to transcribe the meeting Audio.\n",
    "\n",
    "Thank you Youssef!"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "phYYgAbBRvu5"
   },
   "outputs": [],
   "source": [
    "import torch\n",
    "from transformers import AutoModelForSpeechSeq2Seq, AutoProcessor, pipeline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "HdQnWEzW3lzP"
   },
   "outputs": [],
   "source": [
    "AUDIO_MODEL = \"openai/whisper-medium\"\n",
    "speech_model = AutoModelForSpeechSeq2Seq.from_pretrained(AUDIO_MODEL, torch_dtype=torch.float16, low_cpu_mem_usage=True, use_safetensors=True)\n",
    "speech_model.to('cuda')\n",
    "processor = AutoProcessor.from_pretrained(AUDIO_MODEL)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "ZhA_fbeCSAeZ"
   },
   "outputs": [],
   "source": [
    "pipe = pipeline(\n",
    "    \"automatic-speech-recognition\",\n",
    "    model=speech_model,\n",
    "    tokenizer=processor.tokenizer,\n",
    "    feature_extractor=processor.feature_extractor,\n",
    "    torch_dtype=torch.float16,\n",
    "    device='cuda',\n",
    "    return_timestamps=True\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "nrQjKtD53omJ"
   },
   "outputs": [],
   "source": [
    "# Use the Whisper OpenAI model to convert the Audio to Text\n",
    "result = pipe(audio_filename)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "id": "G_XSljOY3tDf"
   },
   "outputs": [],
   "source": [
    "transcription = result[\"text\"]\n",
    "print(transcription)"
   ]
  }
 ],
 "metadata": {
  "accelerator": "GPU",
  "colab": {
   "gpuType": "T4",
   "provenance": []
  },
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.13"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
